Why Spectral Normalization Stabilizes GANs: Analysis and Improvements
Lin, Zinan | Sekar, Vyas | Fanti, Giulia
Spectral normalization (SN) is a widely used technique for improving the stability of Generative Adversarial Networks (GANs) by forcing each layer of the discriminator to have unit spectral norm. This approach controls the Lipschitz constant of the discriminator, and is empirically known to improve sample quality in many GAN architectures. However, there is currently little understanding of why SN is so effective. In this work, we show that SN controls two important failure modes of GAN training: exploding and vanishing gradients. Our proofs illustrate a (perhaps unintentional) connection with the successful LeCun initialization technique, proposed over two decades ago to control gradients in the training of deep neural networks. This connection helps to explain why the most popular implementation of SN for GANs requires no hyperparameter tuning, whereas stricter implementations of SN have poor empirical performance out-of-the-box. Unlike LeCun initialization, which only controls gradient vanishing at the beginning of training, we show that SN tends to preserve this property throughout training. Finally, building on this theoretical understanding, we propose Bidirectional Spectral Normalization (BSN), a modification of SN inspired by Xavier initialization, a later improvement to LeCun initialization. Theoretically, we show that BSN gives better gradient control than SN. Empirically, we demonstrate that BSN outperforms SN in sample quality on several benchmark datasets, while also exhibiting better training stability.
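The core mechanic the abstract refers to, rescaling a weight matrix so its largest singular value is one, can be sketched in plain NumPy using the standard power-iteration estimator commonly used inside SN layers. This is an illustrative sketch, not the authors' BSN variant; the iteration count and seeds are arbitrary choices.

```python
import numpy as np

def spectral_norm(W, n_iters=100):
    """Estimate the largest singular value of W by power iteration.
    Alternates u <- Wv, v <- W^T u with normalization; the estimate
    ||W v|| approaches sigma_max from below."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        sigma = np.linalg.norm(u)  # current spectral-norm estimate
        u /= sigma
    return sigma

def spectrally_normalize(W, n_iters=100):
    """Rescale W so its spectral norm is approximately 1."""
    return W / spectral_norm(W, n_iters)

W = np.random.default_rng(1).standard_normal((64, 128))
W_sn = spectrally_normalize(W)
```

In practice, GAN implementations apply this per discriminator layer and reuse the `u` vector across training steps rather than re-running the full iteration; PyTorch ships such an implementation as `torch.nn.utils.spectral_norm`.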
Common and Discriminative Subspace Kernel-Based Multiblock Tensor Partial Least Squares Regression
Hou, Ming (Laval University) | Zhao, Qibin (RIKEN Brain Science Institute and Shanghai Jiao Tong University) | Chaib-draa, Brahim (Laval University) | Cichocki, Andrzej (RIKEN Brain Science Institute)
In this work, we introduce a new generalized nonlinear tensor regression framework called kernel-based multiblock tensor partial least squares (KMTPLS) for predicting a set of dependent tensor blocks from a set of independent tensor blocks through the extraction of a small number of common and discriminative latent components. By considering both common and discriminative features, KMTPLS effectively fuses the information from multiple tensorial data sources and unifies the single and multiblock tensor regression scenarios into one general model. Moreover, in contrast to multilinear models, KMTPLS successfully addresses the nonlinear dependencies between multiple response and predictor tensor blocks by combining kernel machines with joint Tucker decomposition, resulting in a significant performance gain in terms of predictability. An efficient learning algorithm for KMTPLS based on sequentially extracting common and discriminative latent vectors is also presented. Finally, to show the effectiveness and advantages of our approach, we test it on a real-life regression task in computer vision: reconstruction of human pose from multiview video sequences.
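For reference, the kernel PLS machinery that KMTPLS builds on can be sketched in its basic single-block form (in the style of Rosipal and Trejo's kernel PLS): latent score vectors are extracted one at a time by a power-type iteration, the kernel and response matrices are deflated, and prediction uses dual regression coefficients. This is a minimal illustration under simplifying assumptions, not the authors' multiblock algorithm; kernel centering is omitted for brevity, and `rbf_kernel`, `gamma`, and the component count are arbitrary choices.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def kernel_pls_fit(K, Y, n_components=3, n_iters=50):
    """Sequentially extract latent score vectors t (kernel side) and u
    (response side), deflating K and Y after each component. Returns
    dual coefficients B such that predictions are K_new @ B."""
    n = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()
    T, U = [], []
    for _ in range(n_components):
        u = Yd[:, 0].copy()
        for _ in range(n_iters):
            t = Kd @ u
            t /= np.linalg.norm(t)
            c = Yd.T @ t            # response loadings
            u = Yd @ c
            u /= np.linalg.norm(u)
        T.append(t)
        U.append(u)
        P = np.eye(n) - np.outer(t, t)   # deflation projector
        Kd = P @ Kd @ P
        Yd = Yd - np.outer(t, t @ Yd)
    T, U = np.stack(T, 1), np.stack(U, 1)
    B = U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)
    return B

# Toy usage: fit a nonlinear scalar response on 30 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
Y = np.sin(X[:, :1]) + 0.1 * rng.standard_normal((30, 1))
K = rbf_kernel(X, X)
B = kernel_pls_fit(K, Y, n_components=4)
Y_hat = K @ B   # in-sample predictions
```

KMTPLS extends this scheme to multiple tensor-valued predictor and response blocks, splitting the extracted latent vectors into common and discriminative sets, which the sketch above does not attempt.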